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1 Introduction 



In the last couple of years the modeling of financial data observed at high-frequency became 
one of the major research topics in the field of financial econometrics. It is of high practical rele- 
vance because a rising number of market participants execute trades based on high-frequency 
strategies and are exposed to high-frequency market risk. Examples of those trading strategies 
are statistical arbitrage, the execution of large block trades, and market making. For most strate- 
gies the spot volatility is important for trading signal generation and risk management. Often the 
immediate detection of sudden volatility movements is particularly relevant for traders. Usually 
high-frequency trading strategies are highly automated. In fact, they are often "speed games" 
and only profitable if one reacts to market changes faster than other market participants. In a 
high-frequency setting the estimation of spot volatility is much more complicated due to the pres- 
ence of market microstructure noise. Overall, this causes the need for an on-line spot volatility 
estimator which filters out market microstructure noise and adapts to volatility movements quickly. 
In addition, it is required to be computationally efficient. We propose an estimation method which 
satisfies these requirements. 

In our method described below, the efficient log-price process of a security is treated as a la- 
tent state in a nonlinear state-space model. The relation between the efficient log-prices and the 
transaction prices is described by a new class of nonlinear market microstructure noise models 
leading to a particular form of the observation equation in the state-space model. A computation- 
ally efficient particle filter is developed which allows the estimation of the filtering distributions of 
the efficient log-prices given the observed transaction prices. Based on the filtering distributions 
the time-varying volatility is estimated by using a new sequential Expectation-Maximization (EM) 
algorithm. Our procedure works on-line and updates the volatility estimate immediately when 
a new transaction comes in. The method is suitable for real-time applications because of its 
computational efficiency. Contrary to several other papers we do not assume that the transaction 
times are equidistant nor do we use interpolated prices. 

Until recently, the main focus in the literature has been on the estimation of the integrated 
volatility. This task has been studied extensively under various assumptions on the market mi- 
crostructure noise (Zhou 1996; Zhang et al. 2005; Andersen et al. 2006; Bandi and Russell 
2006, 2008; Hansen and Lunde 2006; Barndorff-Nielsen et al. 2008; Kalnina and Linton 2008; 
Christensen et al. 2009; Jacod et al. 2009; Podolskij and Vetter 2009). Some authors suggested 
that estimates of the spot volatility can be obtained through localized versions of estimators for 
the integrated volatility (Harris 1990; Foster and Nelson 1996; Zeng 2003; Fan and Wang 2008; 
Bos et al. 2009; Kristensen 2009) or by (Fourier-) series methods (Munk and Schmidt-Hieber 
2009). However, these methods are essentially off-line procedures. 

In this article, transaction data are treated as noisy observations of a latent efficient log-price 
process X t . We assume that transaction prices Y tj are observed at times h < t 2 < . . . < t T - 
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The evolution of the efficient log-price process is modeled by a random walk in transaction time 
with possibly time-varying volatility a tp that is 



X tj =X t . 1+ Z tj (1) 



3.4 



Drift 



with Z t . ~ JV(O,0f ), or alternatively by a diffusion model in clock time - see Section 
terms are ignored because their effect is of lower order with high-frequency data. 

We make a distinction between volatility per time unit and volatility per transaction and provide 
estimators for both. We start with a model in transaction time instead of clock time leading to an 



estimator of the spot volatility per transaction. In Section \3A\ a transformation from transaction 
time volatility to clock time volatility is given leading to a subsequent estimator of the volatility per 
time unit. In addition, we give a direct clock time estimator. In our opinion a model in transaction 
time has at least two advantages: First, we are convinced that the returns of the unobserved 
(log-) efficient price in a transaction time model can be modeled in most situations quite well 
by a Gaussian distribution (we believe that the often observed "jumps" in asset prices are for 
high-frequency data often due to an increase of the trading intensity and not to a heavy tailed 



distribution - see sections |3.4[ |5.2| and [6}, and second, volatility in transaction time is more 



constant than volatility in clock time making the algorithm more stable (Ane and Geman 2000; 
Plerou et al. 2001; Gabaix et al. 2003 - see also Section [5^2| . We emphasize that these 



modeling issues need a deeper investigation. Note that we assume a Gaussian distribution for 
the unobserved efficient log-prize while the observed prize is not Gaussian but usually discrete. 

The relation between the unobserved efficient (log-)prices and the observed transaction prices 
is described through a general nonlinear market microstructure noise model which is completely 
different from the models considered so far. It can be expressed through a nonlinear equation 

Y tj = g tj { exp[X tj )) = g t . ;Yti: ._ i {exp[X tj }) , (2) 

where the function g tj is a (possibly time-inhomogeneous) rounding scheme. A simple example 
is the rounding of exp[X t .] to the nearest cent or the situation where the next trade is made 
with probability 1/2 on the closest bid- and with probability 1/2 on the closest ask-level of an 
order book. In Section [2] we give several examples. Usually the rounding also depends on 
past observations Y tl[j _ 1 := {Y tl ,. . ..Y^^} or exogenous variables such as order book data or 
market maker quotes. 

The state equation ([T) and the observation equation ^ form a nonlinear state-space model 
(see also ^ and ^). The spot volatility curve is considered as a parameter of this state-space 
model. The estimation is done through a particle filter and a new sequential EM-type algorithm. 
Very roughly speaking our volatility estimator can be viewed as a localized realized volatility 
estimator based upon the particles of the particle filter. In detail the situation is however more 
complicated because we need a back and forth between particle filter and volatility estimator to 
obtain a decent on-line estimator. 
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The article is organized as follows. Section [2] describes the nonlinear market microstructure 
noise model. In Section [3] a particle filter and a sequential EM-type algorithm are proposed for 
on-line estimation of spot volatility in transaction time. Furthermore, the relation between clock 
time volatility and volatility in transaction time are sketched and two estimates of spot volatility in 
clock time are given. A description of the implementation of our algorithm is given in Section |4| 
Finally, simulation results and an application to real data are presented in Section [5] followed by 
some conclusions in Section [6) 

In some parts of the paper we could replace the notation y t x tj by the simpler notation yj, xj 



etc. Since we need the former notation in Section [34] and at a few other places we decided to 
stick with this notation throughout. 

2 Nonlinear Market Microstructure Noise Models 

In most existing market microstructure models the efficient log-price is assumed to be corrupted 
by additive stationary noise (ATt-Sahalia et al. 2005; Zhang et al. 2005; Bandi and Russell 
2006; Hansen and Lunde 2006; Barndorff-Nielsen et al. 2008). The noise variables are typically 
independent of the efficient log-price process. The major weakness of these models is that 
they cannot reproduce the discreteness of transaction prices. More adequate models which 
incorporate rounding noise have also been considered (Ball 1988; Large 2007; Li and Mykland 
2007; Robert and Rosenbaum 2008; Rosenbaum 2009). Popular models are based on additive 
noise followed by rounding according to the smallest tick size as in Q. We discuss these models 
in relation to our model below. Rounding observation noise is also an issue in many other fields 
such as engineering. An example are roundoff errors due to the word-length restriction which 
have also been discussed within state-space models (c.f. Hinamoto et al. 2003; Renfors 1983). 

Our model is based on the idea that the observed price is obtained from the unknown effi- 
cient price by means of a generalized rounding function as in |2). This rounding may be time- 
inhomogeneous and done in different ways - examples are rounding to the nearest cent, rounding 
to the closest liquid level of an order book (cf. Example 2 below), rounding to the quotes created 
by a market maker (cf. Example 3) or to some levels estimated from previous observations Y tl[j _ 1 
(cf. Example 4). In general these levels are created from additional exogeneous information or in 
an unspecified way from previous observations yt Xlj _ x (e.g. the levels in an order book). In most 
cases we assume that (conditional on yt x . 4 _ x ) these levels are known. Example 4 is an example 
where such exogeneous information is unavailable and the levels are estimated. 

Model assumption 1 (observation equation / microstructure noise model): 

(i) The distribution of Y t . = fl^-^ .,_( exp[X t .]) is discrete. 

(ii) The conditional distribution of Y tj given the state X tj and previous observations is of the form 

p(y tj \yt 1: ^ 1 ,exp[x t:j }) (xl Atj (exp[xt 3 }) a.S. (3) 
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where the set A tj depends on y t . and on the conditioning observations y tl , . . ^yt^^- "a.s." 
means almost surely with respect to the distribution of X tj . 



It is important to note that the concrete specification of the set A tj is an important part of the 
microstructure noise model at hand. In Examples 1-4 we shall give different options how to 
specify A t . in different situations. Note that we need to know A t . only for the observed y tj 
and not for all possible realizations of Y t .. If the function g tj = gt ] -Yt 1 . j _ 1 is deterministic then 
A tj ■= Qt^Vtj) = i z '■ 9t 3 {z) = y tj } is the inverse image of y t . under g tj . The "a.s." will be omitted 
in the rest of the paper. In particular Proposition Q] continues to hold if f3} only holds almost 
surely. 

Reasons for choosing the model: We chose the above model for three reasons: 

(a) The particle filter takes a simple form: The optimal proposal becomes a truncated normal 
distribution and the importance weights are also easy to calculate (see Proposition 1). This 
means that the filter is much more efficient than in the general case and less particles (and less 
computation time) are needed. 

(b) The model covers many important cases of microstructure noise (see the Examples below). 
We advocate in particular the deterministic rounding in Example 1 (with its more sophisticated 
variants in the subsequent examples) which describe in a sufficient way several stylized facts of 
high-frequency data - namely the discreteness of prices, the bid-ask bounce and time-varying 
bid-ask spread and the form of autocorrelations and partial autocorrelations of real log-returns 
(see the simulation example at the end of this chapter). 

(c) Our model is more parsimonious than other models. Popular models with rounding which 
have recently been considered (not in combination with particle filtering) are 

Y tj = round (exp[X % ] + U t .) or Y t . = round(ex P [X tj + U tj ]) (4) 

with a simple rounding function (e.g. to the nearest cent) and for example i.i.d. Gaussian U t] . 
It is obvious that these models also do not describe perfectly the underlying mechanism. The 
drawback of these models is that it is necessary to estimate the variance of the noise variable U tj 
while in our model there is no need for estimation of further parameters. We show in the remark 



at the end of Section [3T| that these models can also be handled with the methods of this paper 
if the modified state variable X tj = {X tv U tj )' is used. 

Identifiability and approximation of the filtering distribution: (|9j) and |3| imply that the joint filter- 
ing distribution p(x tl:] \y tl ..) is uniquely defined under the model assumptions. A tj and \ogA tj 
are the supports of the filtering distribution of the efficient price p(ex.p[x tj ]\yt 1:j ) and the efficient 



log-price p(x tj \ yt 1:j ), respectively. It will be shown in Section 3.1 that the filtering distributions can 
be approximated through a particle filter. A real data example is given in Figure Q] It shows the 
supports A t (gray vertical lines) and kernel density estimates of the filtering distributions of the 
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Figure 1 : A real data example of estimated filtering distributions based on our market microstructure noise model 
with deterministic rounding for the c ase when market maker quotes are available in addition to the transaction data. 
The details are provided in Section [572] The plot shows some transaction prices (circles) along with kernel density 
estimates of the filtering distributions of the efficient prices (black lines) based on the particles produced by our particle 
filter. The gray vertical lines indicate the assumed support of the filtering distributions. The bid and ask market maker 
quotes are displayed by gray and black horizontal lines, respectively. The x-axis shows transaction time. 



efficient prices (black lines) which are computed based on the output of the particle filter. In this 
example, market maker quotes are available (see Example 4 below). The details of this example 



are provided in Section 5.2 



For completeness we give the state equation again. The reasons for choosing this model for 
the efficient price have already been given in the introduction. 

Model assumption 2 (state equation / efficient price model): 

The unobserved efficient price is given by exp(x tj ) with 



P (x t:j ) = M(x tj | x tj _ x ; of ) , 



(5) 



of is assumed to be either constant or slowly varying in time, that is we assume some smooth- 
ness for of.. 

The smoothness assumption does not need to be specified any further because we do not use 



it formally. However, without this assumption the estimation procedure developed in Section [3^2 
would not make sense. 

Modeling microstructure noise via the specification of A tj : In order to carry out the particle filter 
and the volatility estimate described later we have to specify for the observation y tj at hand the 
set A tj i.e. the set of the possible efficient prices. This specification is an important modeling 
step. We now give examples. 

Example 1 (simple deterministic and stochastic rounding): 

(i) The simplest example is the rounding of exp[x tj ] to the nearest integer (say cent) - i.e. y tj = 

round(exp[x t J). In this case A t . = [y tj -0.5,y tj +0.5) andp(y tj .|y tl:J ._ :L ,exp[a; t J) = l At . (expfoj). 

(ii) A simple stochastic example is where we choose for exp[x t .] e (n,n + 1) the values y t = n 
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and y tj = n + 1 each with probability 1/2. In that case A tj = (y^ - 1, y^- + 1) and 
p(ytj|yti:j-i,exp[a; tj .]) = ± l^(exp[x^]) for almost all x tj .. It seems natural to set y tj = n for 
exp[x 4j ] = n but with order book data as in Example 2 below this choice is no longer natural. 

Surprisingly, it turns out that deterministic rounding describes better the empirically observed 
behavior of transaction prices than stochastic rounding (see Figure|3]and the discussion in Exam- 
ple 2). For this reason we use throughout this paper different variants of deterministic rounding. 

We now give some examples where we model the bid-ask spread of financial transaction data. 
In these cases A tj depends on past observations and/or exogenous data. 

Example 2 (order book data): 

Let's assume that at each transaction time tj the exchange provides a limit order book with bid 
and ask levels given by of. (k = l,2,...,K) and f3f. {£= 1, 2, . . . , L) respectively (these are the 
levels where contract offers are really available). The order book levels satisfy < . . . < aj. < 

a}. < < < ... < fit. and we denote 

M tj - := {<*£,..., a\ , a\ , , /?f. ,...,/?£}. 

M tj - represents the state of the order book immediately before the transaction at time tj occurs. 
Mt r depends in an unknown way on the past observations l and exogenous information. 
Clearly y tj e M tj - . We now set, corresponding to the deterministic case (i) in Example 1 

A t . :={zgR: argmin y6Mtj _|* - 7I = ytj (6) 

or equivalently 

9t r ,y H:j Jz) ■■= argm\n ieMtr \z - 7 | . 

Thus the transaction price at time tj is that price from M tj - with the smallest Euclidean distance 
to the efficient price. This means that the efficient price at time tj is assumed to be closer to the 
observed price y tj than to any other order book level. Of course, this cannot be guaranteed and 
it seems to be more realistic to choose for (say) 7 e (a|.,$.) yt 3 = aj. and y tj = $\. each with 
probability 1/2 - i.e. a trade is made with probability 1/2 on the bid and on the ask side. This 
corresponds to the stochastic case (ii) from Example 1 leading to the definition A t] := (largest 
level from M tj - below y tj , smallest level from M t] - above y tj ). Figure [§]however reveals that the 
model with A tj as in ([6| (deterministic rounding) much better captures the stylized facts of real 
transaction data. The explanation may be that in the case of a liquid order book often several 
trades are executed at the same level and therefore the first model gives a better fit. The situation 
may be different if the stock is less heavily traded - however we have not investigated that. 
In the present situation we could better write instead of ^ 

p(%|^t r ,exp[a; tj ]) oc Ij^. (exp[x tj .]) a.S. 

where M tj - contains implicitly the relevant information from yt V] _ x - 
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Figure 2: An example of the market microstructure noise model with deterministic rounding for the case when order 
book data are available. The figure shows the transaction prices (circles), the (in practice unknown) efficient prices 
in transaction time (diamonds), the latent efficient price process in clock time (black line), the order book levels (gray 
horizontal lines), and the supports of the filtering distributions of the efficient prices (gray vertical lines). 

If the volume of the trade at time tj is so large that it is executed on several levels of the order 
book then y t . should be set equal to the largest ask level (smallest bid level) and all lower levels 
should be deleted before determining A tj . 

An example of this market microstructure model is visualized in Figure|2} The intervals A tj are 
denoted by thick vertical lines. Note, that these are also the supports of the filtering distributions. 
Larger intervals A t . are usually due to a larger bid-ask spread. 

Example 3 (market maker quotes): 

In case where market maker quotes are available instead of order book data, we only have a 
single bid and a single ask level a tj and /3 tj , respectively, which satisfy a tj < fit y That is, y tj is 
either equal to a tj or equal to /3 tj . Corresponding to the deterministic case in Example 1 we set 

A tj = [y tj - A t j, y t j + A tj ) 

where A t . := 0.5 (/3 tj -a tj ). 

From a certain point of view this choice of A tj seems to be not adequate and one is tempted 
to chose 

A = f (-oo, at,+ A ti ), \\y t j = a tj 

tj I [A 3 -A t ,,oo), \iytj=Ptj 
To understand why this is not a proper choice one needs to look in more detail at the behavior of 
the market maker. Of course the market maker has more (invisible) levels which are automatically 
executed at the same time if the efficient price makes larger jumps. Furthermore, the market 
maker has additional information on the efficient price (say from trades of correlated securities) 
and may have already adjusted his levels towards the efficient price. This last fact violates 
our model assumptions (in that the function g tj not only depends on past values yt x ,j^ and 
exogenous information but also somehow on exp[x tj }) but in particular in this situation our model 
with the above choice of A tj seems to be a reasonable parsimonious model. 



This example also demonstrates the advantage of the fact that we just have to specify the 
inverse image A t for the y t , at hand. 
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Figure 3: Comparison of real transaction data for Citigroup (left column) with simulated data from the market 
microstructure noise model with deterministic rounding (middle column) and stochastic rounding (right column). The 
plots show (from top to bottom): 10,000 transaction prices; the first 250 transaction prices and the efficient price 
process of the simulated data; the autocorrelations and the partial autocorrelations of the returns of the transaction 
prices. 

Example 4 (transaction data only): 

We now consider the situation where no order book data or market maker quotes are available. 
In this case we try to estimate the order book levels from the data and use afterwards the A tj 
from Example 2. More precisely we estimate half the bid-ask spread at time tj by 



A, 



0.5 \y t . 



in 



3-1 I 



if ytj + ytj-i, 
else 



and set 



At 



[Vti ~ A t , y tj + A t 



Surprisingly, this specification does not belong to a deterministic but to a stochastic mapping g t . 
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(it is not difficult to see that the same x tj lies in A tj constructed from different y tj - hence g t . must 
be stochastic). The mapping g tj becomes deterministic (conditionally on j/t 1:j -_ x ) if one replaces 
A tj by At,-!- However A tj _ 1 obviously is much worse than A tj as an estimate of the bid-ask 
spread at time tj - so from a practical point of view the above specification is to be preferred. 

We finally demonstrate in a simulation example that our model reproduces the autocorrelations 
and partial autocorrelations of real log-returns. In addition we compare the deterministic and the 
stochastic rounding from Example 1 . In Figure ^transaction data of Citigroup are compared with 
data simulated from our model with the two different rounding schemes from Example 1 (i) and (ii), 
respectively. The figure shows the simulated efficient and the observed prices. The efficient log- 
prices were generated according to ([[} such that the observations have approximately the same 
volatility as the Citigroup data. We emphasize the large number of zero returns in the true data 
which are reproduced by the simulated data with deterministic rounding. The important point, 
however, is that our market microstructure noise model with deterministic rounding automatically 
introduces autocorrelations and partial autocorrelations of the log-returns which are very similar 
to those of the real Citigroup data. 

For this reason, the price discreteness, and the fact that our model covers bid-ask bounces and 
time-varying bid-ask spreads we believe that our model better describes the real world market 
microstructure than most existing models. 

3 On-Line Estimation of Spot Volatility 

We now present on-line algorithms for the estimation of the spot volatility. Because all results 
also hold in the multivariate case with synchronous trading times we formulate this section for 
multivariate security prices. We are aware of the fact that the main challenge in the multivariate 
case are non-synchronous trading times. The presented results are, however, the basis for future 
work on non-synchronous trading. 

We therefore consider in this section the estimation of the covariance matrix E tj which gives 
the volatilities of the individual efficient log-price processes X. t = (X t> x, . . . ,X t ^) T as well as their 
cross-volatilities. The algorithms for the spot volatility are obtained by setting E tj = of.. The 
multivariate version of the nonlinear state-space model |3) and ^ is given by 

Kytj|y*i :J -i' ex P[ x tjD K ^.(expfxtj) a.s., (7) 
PKK-i) = -^(xt-lx^;^.), (8) 

where X t = (X t< i, . . . ,X t ,s) T and Z tj ~7V(0,S ti ). The set A tj usually is of the form A tj = 
A tjj i x • • -xA tjtS - We assume that model assumption 1 holds for all components. For simplicity we 
assume as an initial condition that given Y tl]S = yt U s the efficient prices ex.p[X tlyS ] are uniformly 
distributed on A tl s . 
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We remark that and ^ constitute a slightly generalized state-space model because the 
observations Y tj are not conditional independent of Y tl:j ._ 1 given X tj as in standard state-space 
models. However, this is a standard extension which does not cause any difficulty for estimation. 

Our objective is the estimation of the covariance matrix E tj based on the observed prices 
y tl .. Because of the nonlinear market microstructure noise this is difficult. It is well known 
that crude estimators that ignore the noise lead to severely biased estimates (see, for instance, 
Voev and Lunde 2007). The idea of our estimation procedure is to approximate the conditional 
distribution of the efficient log-prices X t . given all observed transaction prices y tl[j up to time tj 
by an efficient particle filter. Based on this approximation a localized EM-type algorithm is used 
to construct an estimator of £ t . 

3.1 An Efficient Particle Filter 

Particle filters are sequential Monte Carlo methods (Doucet et al. 2001) that approximate the 
posterior distributions p(x tl;j |y tl;j .) with clouds of particles {x| x ,u\.}f =x . A particle consists of a 
sample xj and an associated weight wj . The particle approximation of the target distribution 
is given by 

N 

P(*t 1:j \yt 1:j ) « >*!:,•)> 
i = l ' J 

with 5 being the Dirac delta function. A particle filter generates particles sequentially in time 
making use of the relation 



p(y tj ,x t \y t ) p(y* |y t ,x t )p(x t |x t , 
P(*t 1:j Yt 1:i ) = t p \ — = t r 2 r 1 — P x 4l y<1 9 

p{yti y*iy_i) p(yt,- yti-j-J 



and a general sampling technique known as importance sampling. Importance sampling is 
necessary because direct sampling from (|9) is not feasible. In standard state-space models 
p(ytj|yti:,-_i! x %) further simplifies to p(y tj \x. tj ). As a result of the violated conditional indepen- 
dence property mentioned earlier, this is not the case here. 

In each iteration of the particle filter samples are drawn from an importance sampling distri- 
bution called proposal. Subsequently, the samples are weighted such that they approximate the 
target distribution. The choice of the proposal is crucial for the efficiency of the filter. In our frame- 
work it is possible to sample from the proposal p(x t .\y tl:j ,x t .^ 1 ) which is the optimal proposal 
in the sense that it minimizes the variance of the importance sampling weights (Doucet et al. 
2000). The algorithm can be stated as follows: Assume that weighted particles {x^ , w| }^ 
approximating ptx^Jy^J are given; then 
• For i = 1, . . . ,N: 

- Sample ~p(x^|y tl!j ,xj i _ 1 ). 

- Compute importance weights 



pCyiilytitf-nxUKxLta*. , 
u h « rr~i 1 — \ =- - 
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• For % = 1, . . . ,N: 

- Normalize importance weights w\ = &ij/(Y,k=i&tj)- 

• Obtain particles {x^ ,uj\ }f =1 which approximate p(x 4l Jy tl:j ). 

It is well-known that this algorithm suffers from weight degeneracy which means that after some 
iterations only few particles will have significant weight. This issue can be resolved by introducing 
a resampling step that maps the particle system {x^..,^.}^ onto an equally weighted parti- 
cle system {xj , l/N}f =l . Because resampling is time-consuming, it is carried out only if the 
effective sample size 

ESS(K.}f A = N 1 . 

is below some threshold (Kong et al. 1994). Other resampling schemes are discussed in Douc 
et al. (2005). 

To apply this particle filter to the state-space model given by {7} and ^ it is necessary to 
specify the optimal proposal and the computation of the importance weights. The following result 
shows that both take a very simple form. 

Proposition 1 . The optimal proposal is a truncated multivariate normal distribution given by 

p(xt. |y tl: . , x t oc TV (x t . | Xi j _ 1 ; S t . ) | log A ^ 
with log A t . = log A tjj i x • • • x log A tjt s and the importance weights can be computed through 

4*<4_ a / AA(x tj |xj ;S tj )dx t .. (10) 

Proof: It is easy to verify that the optimal proposal satisfies 

p( x % \yt 1:j , xt^J oc p{y tj |yt 1:j -_ a , x 4j ) p(x tj |x t ^_ 1 ) . 

leading together with and (|8]) to the assertion. The expression for the importance weights 
follows from 

pWyti:,--!, = /p(y%|yti:^i> x ^)pKlx|._ 1 )dx t . = / ptxtjxj.^)^. 

Remark: For the market microstructure noise models @ the same methods can be applied 
after a modification of the state variable. Consider in the univariate case for example Y tj = 

round(exp[X t . + U tj }) and let A tj = [y tj - 0.5,y tj + 0.5). 

Consider the corresponding state space models with state variable X tj = (X tj ,U tj )'. Then 
the optimal proposal satisfies 

p{xt 3 \yt 1:j , 5:%-! ) oc p(y tj |yt 1:J _! , x tj )p{x tj Ixt^J 

= p(ytj bt 1:f _i , x t j)p(xtj \x t ^ 1 )p{u t] ) 

= p{x tj \x tj _ 1 )p{u tj ) lA tj {ex-V[x t j + u tj \) 
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// 



which can be used easily to sample the particles in the filter step. The importance weights are 
given by 

p(ytj \ytuj-i , ^t,_! ) = Pis/tjlvti..,-! , x l.J 

l 1 o gi , j (x« j+ »,,p(x tjK ._>K)^^ 

llog A tj (x tj + u tj ) M{x tj | x\ i _ 1 ; of. ) M(u tj \0;afj) dx tj du t] 
which are however more difficult to compute (compare Section [4}. 

3.2 A Sequential EM-Type Algorithm 

In this section, we discuss the estimation of T, tj in the time-constant and time-varying case. 

A stochastic EM algorithm can be used to obtain the maximum likelihood estimator in the 
time-constant case E tj = £ (Dempster et al. 1977). The EM algorithm maximizes the likelihood 
PT,{yt 1:T ) by iteratively carrying out an E-step and an M-step. In the E-step, the expectation 

Q(S|S( m )) = E ±im) [logpx(X tl:T ,y tl:T )\y tl:T ] 

T 

= ^E £(m) [logp^lyt^.^XtJIy^] + E £(m) [logp(X tl )|y tl:T ] 

3=1 

T 

+ Efi(m) [ logPE(X t , IX^.JIyt^] (11) 
needs to be approximated, where t,^ is the current estimator. Note, it is sufficient to consider 



the sum in (11 because the random variables logp(ydy t _ i; X t ) and p(X tl ) do not depend 



on S. In the M-step, a new parameter estimate £( m+1 ) is obtained by maximizing Q(S|S( m )). 

If S tj . is time-varying some regularization is needed. For example S^ m+1) can be obtained by 
maximizing some localized version of (TT) , e.g. 

St,(S|s£) = ^ £ M^)V° [logp E (X t ._JX,._ fc _ 1 )|y tl:T ] (12) 

with a kernel if (•) and a bandwidth 6. 

An approximation of Q(S|S (m) ) and Q%(S|S^) can be computed based on the smoothing 
particles 

{ X Lt' u t T }i=l 

from our particle filter or (with higher precision) from existing particle smoothing algorithms (God- 
sill et al. 2004; Neddermeyer 2010; Briers et al. 2010). The smoothing particles give the approx- 
imation 

E s il:T (X* ._ h | X* ._ fc _J |y tlsT ] 

N 

i=i 
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which leads, with 



to the maximizers 



and 



N 



S % (^ T ):=^<(x^-4_ 1 )(4-x^_ i ; 



i=l 



^(m+1) 



T 



1 T 

3=2 



t +1) = [E<^ 1-1 



(14) 



(15) 



(16) 



of ([M) and (|12j), respectively (note that the particles and, therefore, also E depend on m.) 

Instead of these estimates, one will prefer in most situations an on-line algorithm which up- 
dates the estimates when a new observation comes in. This requires on the one hand the use 
of filtering particles instead of smoothing particles and on the other hand an integration of the 
E-step into the algorithm. 

We now develop such an algorithm step-by-step. Note that the recursion developed in 1) 
below is not an on-line algorithm. It is just discussed to demonstrate the relation of the on-line 
algorithms in |25) and f26} to the estimates (Q5) and (Q6), respectively. Note, in the following 
steps the notation %. is used for different estimates. 

1) A "recursive" solution for the above situation (both for time-constant and time-varying E 4j .) is 

Q % (E|E tl:T ) := {1 - Xj} Qt^mtt^) + Xj [logp E (X % |X t ,_ x )|y tliT ] (17) 

with Q t2 (E|E tl:T ) = [logp s (X t2 |X tl )|y tl:T ] leading to 

j ' — 3 k—1 

Q tj (nK.r) = E [iL 1 " X i-*)) X i- k E ^. T [ lo g^(X^._ fc |X t ._ fc _ 1 )|y tl:T ] 
A:=0 e=o 

+ [II^ 1 - x i-*)\ E s t , T [ lo g^(X t2 |X tl )|y tl:T ]. (18) 

i=o 1,T 



With the "constant parameter setting" Xj := - 1) , where ELLoU ~ x j- 
gives the classical (quasi-) likelihood 

3-2 



Xj-k — jzx- this 



J 



1 

ZT\ EEs^t^gPsCX^JX^^JIy^], 

fc=0 



that is ( fTT) for j = T. Furthermore, the maximizer of (18| is, with the smoother-approximation as 
in {T3} and f, t .(oj tT ) as in (14 , given by 



3 -3 fe-i 



.3-3 



= J2 [ JJ(1_ A ,_,)] \j_ k t t ._ k {u tT ) + [ JJ(1 " X 3-i)\ KM 



(19) 



fc=o 1=0 

This can be written as the recursion 

■A(m+1) _ 



1=0 



{l-A^E^+A^K 
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with = E t2 (oj tT ). Again, we obtain with the "constant parameter setting" Xj -.= - 1) 

that E^ m+1) coincides with the estimate in ^ for j = T. 

2) On-line algorithms: The above algorithm is not an on-line algorithm because the conditional 



expectation in (Q7| depends on all observations. Therefore, we replace the conditioning set 
of variables {y* 1:T } by {y tlJ } meaning that we pass from the smoothing distribution to the 
filtering distribution. More precisely, |"logp s (X ti t |X t , ,)|y tl J is replaced by 
|"logp s (Xf |X t )|y 4l 1 (we need at this point an estimate for £ t - see the comment 
at the end of this section) leading to the on-line algorithm 

Q^(>:|>: i|ti ,) := {1 - Xj} Qt^ntt^) + A, E^.^ [lo gPE (X t . IX^JIy^,] (20) 



with Qt 2 (S|E tl ) = Eg t [logpE(Xf 2 |X tl )|y tl:2 ]. (18 holds analogously and we now obtain anal- 
ogous to (Q9) the estimate 



i-3 k-i 



J-3 



E [lI( 1 - A ^)]VfeSt,_ fe K._J+ [IJ(I-A^)] S* 2 K) (21) 

fc=0 £=0 £=0 

now with 

AT 

: = E4K - x U)K -4-J T t 22 ) 

i=l 

based on the filtering particles {xi^.., ^ilili- This estimate can be obtained from the on-line 
recursion 

St. = {1 - A,} + Aj S^KO with ± t2 = t t2 (cj t2 ). (23) 

Observe that the estimated covariance matrix %. is positive semi-definite by construction. 

The new parameter estimate t, tj is used afterwards to calculate the next filtering particles 
and their weights {^. +1 ,oj\. +1 }f =l followed by the calculation of £ tj+1 via another application of 
( |23| etc. In contrast to the standard EM algorithm, our sequential variant therefore updates the 
covariance estimate (which in turn is used in the next step of the particle filter) in every time step. 
In the "new E-step", Qt^^Y,^^) is approximated through 



N 



1J 2 



2 XX [siog27r + log|£| +tr{s" 1 (xj j -xj._ 1 )(xj. - 4^f} 



A , — 

8=1 



3.2 



In the "new 



using the particles {y^ _ 1 ..,^\}f = i which are generated as described in Section 
M-step", the maximization of Qt^SjE^^J gives the on-line estimator defined in (123 . 

Note that % J .(u tj ) is not an approximation of the conditional variance Var(X tj -X ij _ 1 |y tl:j ) but 
an approximation of E((X^-X^_ 1 ) 2 |y tl:i ) (both are different because E(X^-X ti _ 1 |y tl: ^) / 0). 
AsaresultofE[E((X tj -X ij _ 1 ) 2 |Y tl;j )] = E(X t .-X t ._ 1 ) 2 = Var(X t .-X t ._ 1 ), E t . isadescent 
estimator of S^.. 
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3) Time-constant covariance matrices: If E 4j is time-constant the first idea is to apply the algo- 
rithm |23j with the "constant parameter setting" Xj = - 1) . However, the situation is different 
from the classical case in that the "old" estimate % ^ has in addition some bias due to the use of 
particles generated with an estimated covariance instead of the true one. Therefore we need to 
put less weight on the first term in |23]). The situation has been carefully investigated for a similar 
algorithm in the i.i.d.-case by Cappe and Moulines (2009). Following their recommendation we 
use in our situation the on-line algorithm 

t t , = {i - (j - in}iVi + U - ir^K) (25) 

with 7 e (g, 1)- Cappe and Moulines prove consistency and asymptotic normality of their esti- 
mate for weights Xj ■— A j~ 7 and 7 e (», 1) and also for 7 = 1 under some restrictions on A 
(Theorem 2). Furthermore, in their simulations it turned out that a value of 7 = 0.6 and A = 1 
has lead to good estimates. From our experience we prefer the choice 7 = 0.9 and A = 1 (see 
Figure[5). Even-Dar and Mansour (2003) obtained an optimal value of about 0.85 in a related es- 
timation problem. We emphasize that the choice of 7 needs more investigations - both theoretical 
and practical. 

4) Time-varying covariance matrices: If E tj is time-varying it is necessary to put more weight on 
recent observations. In this case, the traditional solution is to use the algorithms ([17}, |20) and 
|23) with time-constant Xj = X instead of a decaying Xj. That is we use in the time-varying case 

t tj = {1 - A} E tj _ 1 +Xt tj {u tj ) with ± t2 = ± t2 M (26) 

where the choice of A depends on the smoothness of the true volatility curve (cf. the relation 
to a kernel estimate below). To adapt locally to this smoothness one may either choose a time 
varying Xj anyhow (in some way dependent on the data) or use the SAGES procedure (see 
Section [4] below) where the algorithm is run simultaneously for K different values of A and the 
optimal estimate is determined at each step as a convex combination of these estimates. 

For a deeper understanding we stress the following heuristics: If Xj = X and tj =38 (e.g. 
S = ^) then we have with b := f for 5 ->■ 

[ha - v+H - a - ^ = &-ir~ !(.-*)" - 'At) < 27 > 



where K(x) := e~ x . That is Q t .(E|St 1:T ) from (20 is basically the kernel likelihood given in 
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with the one-sided exponential kernel, and E t . given by (26 > is basically the kernel estimate 



k k i=l 



3.3 Combining the Particle Filter and the Sequential EM-Type Algorithm 

To summarize our estimation method consists of three components: 



15 




10000 



15000 



Figure 4: Estimation of two time-varying volatility curves given by the dashed lines based on simulated data. 
Estimators: our estimator E tj (black line), our estimator combined with SAGES for adaptive step size selection Sf. 
(red line), and a benchmark estimator (gray line). For details see Section [B"T| 



(i) The state-space model with a new market microstructure noise model and the transaction 
time model for the efficient log-price (0 and |8)); 

(ii) A particle filter which sequentially approximates the filtering distributions of the efficient 
log-prices given the observed transaction prices (Section |37u) ; 



(iii) The on-line EM-type estimator H t given by (25 or |26k which estimates S t based on 



the particle approximation of the filtering distribution obtained from the particle filter (Sec- 
tion |3j2}. 

A key aspect of the method is the back and forth between the particle filter and the EM-type 
estimator. To propagate the particles from time tj to time t j+ i the particle filter requires an 
estimator of which we denote by sP f . A simple solution is to use S? f := S*. from the 

l J + l 1 tj + l ~ fj+1 l 3 

previous EM-type step. The EM-type estimator then in turn updates the covariance estimate 
based on the new particles for time t j+1 generated by the particle filter. 

Estimation results of our estimator £ tj and a benchmark estimator (see Section 
sented in Figure [4} Details and a discussion are given in Section [5T| 



5.1 



are pre- 
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3.4 From Transaction Time to Clock Time 

Clock Time Spot Volatility Estimation 

In the preceding section, we have derived an algorithm for the estimation of the covariance 
matrix T, tj = £(t,) in a transaction time model. If one prefers a clock time model all results of this 
paper continue to hold with some modifications. In this case one may consider as the underlying 
model the stochastic differential equation 

dX(t) = T(t) dW(t) where T(t) T T (t) = £ c (t) (28) 

and W(t) is a multivariate Brownian motion. £ c (t) is the volatility curve in the clock time model. 
Loosely speaking, it denotes volatility per time unit while £(t) denotes the volatility per transaction 
at time t. The relation between the two curves should be given by {30} (of course this depends 
on the mathematical definition of £(t) and £ c (t)). If we set X fj = X(t,) we obtain almost the 
same state space model as in {7} and {8} but with a modified variance of the transition distribution 
which is now approximately given by 

p(x*. (x^,) = AT(x t . |x t ._ i; \tj - tj-x\ T, c (tj)). 

This is the only change needed in the state-space model j7j), <|sj> . As an estimate ££. we can use 



the on-line estimates (25 and (26 but now with the update matrix (u tj ) replaced by 



V>c /. ,c \ \ , ,ci V 'j tj-xJ V tj tj-lJ IOO\ 

H K )- 2^ ^ u._ f . i (^ 9 ) 



i=1 I*j *j— 1 1 

based on the modified filtering particles {x^..,^}^. 

An Alternative Estimator for Clock Time Spot Volatility 

In the diffusion model {28} the spot volatility in clock time is 

E-W- lim !TlFM± _ lim Var(X( t + A t )-X(^ 

At^O At At^O At 

To clarify the relation to the transaction time volatility £(t) we assume that the transaction times 
tj are realizations of a stochastic point process with intensity function Aj(i) (transaction rate) 
which is independent of the efficient and observed prices. We then have 

Var(X(t + At)-X(i)) v £ i: t< t ,<t+At 
lim : = lim E 



At->o At At^o At 

= lim E| *W»* £fe> , l {j;t<t ^ t + At} l = E( t )A, W 
At^o |{j : t < tj < t + At}\ At w w 

that is 

S c (t) = S(t) A/(t). (30) 
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We stress that this is primarily a nonparametric relation ("variance per time unit = variance per 
transaction x expected number of transactions per time unit") and it depends on the underlying 
model whether this coincides with the definition of E c (t) and £(f) given in the model. A model 
which exactly leads to this formula is the subordinated differential equation dX(t) = T(t) dW N ^ 
with a point process N(t) with intensity A/(t) (cf. Howison and Lamper 2001). The unit of At 
(e.g. milliseconds) determines the unit of £(i) (e.g. variance per millisecond) and of the intensity 
(e.g. expected number of transactions per millisecond). An obvious estimate of the clock time 
volatility therefore is t c (tj) = %. x \{£ : tj - At < t e < tj}\ / At with some At. 

Here we advocate a different estimation method of the intensity function A/(i) which is closer 
related to our on-line scheme, namely the estimation of A/(t) by the inverse of the averaged 
duration times 5j defined by the recursion 

6j = (1 - Xj) Sj-i + Xj (tj - tj-i) with 5 2 = t 2 - t\ 

leading with (|21) to the alternative clock time volatility estimator 

% _ a Ei= (i - A) fe s tJ _ fc K_ fc ) + (i - xy~ 2 £ t2 {u: t2 ) 



S alt(*i) := 



h X Et 3 (l - X) k {tj- k - tj-fc-i) + (1 - A)i" 2 (t 2 - h) ' 

This estimator has a remarkable property: Because t, tl [oj tt ) ~ (tt - the estimator 

is of the form 

that is Sa| t (ti) is a weighted average of the S^(w^) and therefore also a decent estimator in 
the clock time model (the signs stem from the fact that in S^(w^) and E^(uf t ) two different 
particle filters are used - the effect of this is not clear!). Notice that the denominator t^ k - t j _ k _ 1 
in S£ (w£ ) cancels out - leading therefore to a more stable estimator (for example the sharp 
black peaks in Figures [7]and[8]are caused by small values of tj-k - ij-jt-i). 

The above argument contains a pitfall: While £(t) usually is smooth thus requiring a small 
value of A, the intensity of the point process Xi(t) changes considerably over time thus requiring 
larger value of A. For that reason we use different step sizes A for the estimators S 4j and Sj 
(compare Section [5T2]> . 



4 Implementation Overview 
The Algorithm 

The particle filter uses the following steps for j = 2, ... ,T (see Proposition Q) 
• For % = 1, . . . ,N: 

- Generate x|. from the optimal proposal jV(xi i |x|^_ 1 ;E^.)| lo At with = Ei^. 



- Compute the importance weight w|. as in < 1 . If S = 1 this is given by 

4 oc4_ 1 {$(suplog^.|4_ 1 ;Eg) -$(inflogA t .|4_ 1 ;SP ; ;)}. 
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For i = 1, . . . ,N: Normalize the importance weight u\ = ulJYjUi 

If the effective sample size ESS({w^}^ 1 ) < cN (with say c = 0.2), then resample the 
particles using, for instance, the residual resampling scheme (Douc et al. 2005). 



Update the estimator S t according to (25 or (26 



Overall the algorithm is very easy to implement in a few lines. It is computationally very 
efficient because the complexity of one iteration is linear in the number of particles N. In addition 
resampling is required only rarely because the optimal proposal is used. In our applications 
resampling was carried out only about every 15th iteration using a threshold for the effective 
sample size of c = 0.2. As a result of the efficiency of our particle filter, the number of particles 
N is not a critical quantity. Typically, about 500 particles suffice to achieve a sufficient precision 
(see Figure[5). 

Note that in the multivariate case the sampling from the optimal proposal and the evaluation 
of the importance weights is nontrivial. However, both the sampling from and the evaluation of 
a truncated normal distribution are standard problems in statistics which have been discussed 
extensively in the literature. Relevant references for the sampling problem are Geweke (1991) 
and Robert (1 995). More recent approaches based on Gibbs sampling are described by Kotecha 
and Djuric (1999) and Rodriguez- Yam et al. (2004). Also for the numerical approximation of 
multivariate (rectangular) normal probabilities several efficient methods have been proposed for 
instance by Genz (1992, 2004) and Joe (1995). 

Step Size Selection 

In the time-constant case we use the decreasing step size Xj = (j - 1)~ 09 as proposed in 



Section [3[2} This choice is empirically justified (see Figure [5]). 



The step size in the time-varying case is data dependent and can be obtained through the 
following procedure. The mean squared error of t, tj is minimized with respect to A by the cross- 
validation type criterion 

T-l 

crit(A):=^(S 4j -S tj+1 (o; 4j+1 )) 2 . (31) 
i=2 

This cannot be done on-line. In practice, one will use in an on-line setting a A from past experi- 
ence with similar data sets. The expectation of the above criterion is approximately 

T-l 

i=2 

Because the last term does not depend on A we correctly minimize the approximate mean 
squared error. 

Adaptive Step Size Selection Using SAGES 

To adaptively select non-constant step sizes Xj in the time-varying case we propose to use 
spatially aggregated exponential smoothing (SAGES) developed by Chen and Spokoiny (2009). 
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In our setting the SAGES method works as follows. The basic idea is to run K volatility estimators 
Ef. in parallel with different step sizes A 1 > A 2 > . . . > \ K . The resulting SAGES estimate E^ is 
then a convex combination of these estimators. In practice we have, say, K = 15 which implies 
that the computational offset is minimal. In fact, only the recursion ([26} needs to be computed K 
times with different step sizes. 

For every time step j the SAGES estimate Ef. is obtained from the estimators k = 
1,...,K, through the following recursion. 

(i) Set Eg 1 = £i. 

(ii) For k = 2,...,K: Compute 

yS.k ( lk_ 1 ~ Ik \ 

where 

with kernels K(u) = {1 - (u - 1/6)+}+ and /C(£, £) = -0.5{log(E/E) + 1 - E/E}. 

(iii) Obtain the SAGES estimate Ef. = t s t f. 

Note that this method can be applied completely on-line. The parameters k\, k 2 > • • • , K k-i are 
critical values (independent of the time step j) which can be calculated beforehand through a 
Monte Carlo simulation. Note that SAGES is a univariate method. For a more detailed description 
and a theoretical analysis of the SAGES method see Chen and Spokoiny (2009). 

Initialization 

Our experience from many data sets is that the algorithm stabilizes quickly provided that rea- 
sonable starting values are used - e.g. E t2 is chosen from prior knowledge or set to a rough 
initial estimate. The particle filter is started by simulating the x\ a such that the exp[x| 1)S ] are uni- 
formly distributed on A tuB . In order to exclude the effect of starting values we have used in the 
simulations (except from Figurejij the true matrix E 42 as the starting value (i.e. E|? 2 f = % 2 = E t2 ). 

5 Simulations and Applications 

5.1 Results for Simulated Data 
Estimation of time-constant spot volatility 

We first consider the estimation of time-constant spot volatility. An efficient log-price process is 
simulated from ti to t 50 oo with squared volatility equal to E t = 0.0001 2 . The initial efficient price 
exp[X tl ] is sampled from a uniform distribution on [50 - 0.005, 50 + 0.005). The transaction prices 
are obtained by rounding the efficient prices to the nearest cent (see Example 1 (i) in Section [2}. 
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Figure 5: Box plots for the estimation of a time-constant volatility based on simulated data (5,000 transactions). 
The estimator {25} is applied with different numbers of particles TV and different 7 and compared to the benchmark 
estimator and the optimal estimator (not available in practice). The box plots are based on 500 independent runs. 
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Our algorithm for time-constant spot volatility estimation ([25} is applied with different numbers of 
particles N and different values of 7. The initial value xj? 2 f = % 2 is drawn from a uniform dis- 
tribution on (0.00006 2 , 0.00014 2 ) which is quite uninformative. For comparison the results of two 
benchmark algorithms are also reported. The first benchmark method ("Benchmark" in Figure|5) 
is a recursive estimator with a simpler microstructure noise correction. It is related to the method 
in Zumbach et al. (2002) and it is based on the market microstructure model log It, = X t . + U tj , 
where the noise variables U tj are i.i.d. with Var U tj = rf. The recursive estimator is given by 

% := {1 - -^} + max{0, 2ff t + (logy tj - logy^f - max{0, 2f,l '.} (32) 

where ffi. ■- {1- ^vl^-jh ( logy^.-log^.J (log^-log^) (here ^ is used instead 
of -pj because the algorithm starts one time point later). The term max{0, 2r%.} corrects for the 
market microstructure noise. This follows from the fact that 



2 



Cov(logF tj -logKt.^logY^ -logY tj _ 2 ) = -j] 

The second benchmark method is, in some sense, the optimal estimator ("Optimal" in Figure[5j. It 
is unavailable in practice because it uses the latent efficient log-prices. It is computed analogous 
to d25l but instead of the particles it employs the efficient log-prices leading to 



f&* = {1 - (i - l)-"}f&i + (j - l)-"(x tj - x t 



The simulation results are given in terms of box plots which are obtained by 500 independent runs 
(Figure [5). The box plots suggest that our volatility estimator is asymptotically unbiased and that 
7 = 0.9 is a reasonable value. We can also conclude that about 500 particles are sufficient which 
makes our algorithm computationally efficient and suitable for real-time applications. In addition, 
it can be observed that the benchmark estimator has a larger variance than our estimator. 

Estimation of time-varying spot volatility 

We now compare our estimator for time-varying spot volatility % j defined in (26 with a bench- 
mark estimator. The efficient log-prices are generated with respect to the time-varying volatility 
given by the gray dashed lines in Figure [4} The first case (upper plot) is more challenging 
while the second case (lower plot) is more realistic for a volatility curve in transaction time - 
see the real data example in Figure [6] In both cases we use for the initial price exppQj ~ 
^[50 - 0.005,50 + 0.005). Again transaction prices (observations) are obtained by rounding the 
efficient prices to the nearest cent. 15,000 transactions are generated which is typical for one 
trading day of a liquid stock. The particle filter is applied with N = 500 particles. Our estimator 
S tj uses the constant step size A obtained by minimizing (31 . Analogous to (32 we consider 
the benchmark estimator given by 

tf := {1 - A}(£? + max{0, 2f% }) + \{\ogy tj - logy^) 2 - max{0, 2$ } (33) 
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with 772 --{l-j^yfil^-jh (log%-logyt._J(logy t ._ 1 -log^._ 2 ). A is obtained by minimizing 
the criterion 

T-l 

m + max{0, 2^ } - (log yt J+2 - log y tj+1 f f (34) 

(the terms E^ + max{0,2r} 2 j } and (logy 4j+2 - logyt j+1 ) 2 are independent in the additive mi- 
crostructure noise model \ogY tj = X tj + U t . with U tj i.i.d. - thus by using (logy tj+2 - logy 4j+1 ) 2 



(34 becomes a decent estimate of the mean squared error (plus a term constant in A)). For fjf. 
we use the step sizes -p^ because rjf should be close to a constant function. 

All estimators use the true volatility as starting value. Typical outcomes of the estimators are 
given in Figure [4j Note that volatility is plotted (instead of squared volatility). In the second case 
(lower plot) a constant step size is clearly suboptimal. Therefore we also computed our estimator 
combined with the SAGES method for adaptive step size selection tf. as described in Section jj 
S;?. is calculated using K = 15 step sizes ranging from 0.05 to 0.00005 (equally spaced). In the 
first case (upper plot) the estimator Ef. didn't give better results than the estimator T, tj and is 
therefore omitted. We also tried to use the SAGES method for the benchmark estimator. This 
gave surprisingly bad results which are not reported here. 

Because the true E(i,-) is known we can compute the mean squared error Ej^ 1 (E(^)-E(tj)) 2 
for the estimators which gives 1.21 x icr 18 and 1.34 x icr 18 for % j and E^., respectively, for 
the upper plot in Figure [4 For the estimators %., E^., and E^. in the lower plot we obtain 
8.59 x icr 19 , 7.52 x 10~ 19 , and 2.55 x icr 18 . In both plots, our estimators significantly outperforms 
the benchmark estimator. 

The general impression from Figure [4] is that the estimates are undersmoothed. This is in 
part due to the on-line procedure which corresponds to a one-sided kernel. Additional variability 
comes in from the particle filter where the estimated covariance matrix is used instead of the true 
one. We feel that our estimates come close to the need of practitioners who like to have a quickly 
reacting estimate and who prefer to correct undersmoothed estimates by "eye inspection". 

5.2 Results for Real Data 

We use stock data from the TAQ data base. Transactions and market maker quotes of the symbol 
C (Citigroup) for the 3rd September 2007 were extracted from the TAQ data base. To improve 
the data quality we carried out the following data cleaning and transformation. 

Cleaning A: Delete all transactions (quotes) with time stamps outside the main trading period 
(9:30 AM to 4 PM). 

Cleaning B: Delete all transactions (quotes) that are not originating from the NYSE. 

Cleaning C: Delete all transactions with abnormal sale condition or corrected prices (see the 
TAQ User's Guide for details). 
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Data transformation: If multiple transactions have the same time stamp (after the data clean- 
ing) apply the following transformation. Assume tj = t j+x = ... = tk-i ^ t k . Replace U by 

t\ = tj + (I - j)(t k - tj)/(k - j) for I = j + 1, . . . , k - 1. 

After the data cleaning 16,287 transactions remained. The transformation replaces identical 
time stamps with time stamps that are equally spaced. This transformation is necessary because 
the time stamp precision of our data is limited to one second. We mention that the transformation 
is only required for clock time volatility estimation. If the volatility is estimated in transaction time 
then the times stamps are irrelevant (only the order of the transactions matters). 

Unfortunately, the quality of the TAQ data is to poor to match easily the transactions with the 
market maker quotes. Note that it is necessary for our method that the transaction and quote 
data are perfectly matched. Therefore, our simulations are mainly focused on transaction data. 

Estimation results for real market maker quotes 

In order to show how our method works in the case when market maker quotes are available (Ex- 
ample 3 in Section|2]) we matched by hand (through an adjustment of the time stamps) the quotes 
and transactions of symbol C for a fraction of the trading day. As mentioned earlier, the quality 
of our data is to poor to do this automatically. Our particle filter is used with N = 5, 000 particles 
to estimate the filtering distributions of the unknown efficient (log-)prices. Figure [T] gives kernel 
density estimates of filtering distributions of some efficient prices which are computed based 
on the particle approximations. The market maker quotes, the transaction prices, and supports 
of the filtering distributions are also shown. From the figure it can be seen that some filtering 
distributions are highly skewed. In addition, consecutive zero returns lead to very uninformative 
filtering distributions (see transactions 2,300 through 2,309). 

Estimation results for real transaction data 



We apply our estimators E tj and £f. with N = 500 particles and the benchmark method Ef. (33 
to estimate the spot volatility for C. An initial volatility of 0.0005 is used. 

The transaction data of C and the volatility estimators are shown in Figure [6} At the beginning 
of the trading day the volatility is large and highly varying. Later, the volatility settles down and 
seems to be almost constant. Therefore, the SAGES method for localized step size selection is 
clearly advantageous compared to fixed step sizes. Again the benchmark estimator is rougher 
than our estimators. Practically, the volatility in transaction time is almost constant after 1 0:00 AM 
which in our experience is a typical feature of transaction data of liquid stocks. 

Clock time spot volatility estimation 

We now compare our two approaches for the estimation of spot volatility in clock time proposed 



in Section 



3.4 



The first estimator Sf 5 we consider is the estimator £5 combined with the SAGES 
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Figure 6: Real data example: Estimation of time-varying spot volatility in transaction time. The upper plot shows 
the transaction data of the symbol C for the 3rd September 2007. The middle and the lower plot give the volatility 
estimators E tj (black line), Ef. (red line), and the benchmark estimator Ef. (gray line). 



method (black line in Figure [7). Because the spot volatility in clock time is more volatile than in 
transaction time SAGES requires larger step sizes. We use step sizes equally spaced between 



0.3 and 0.003. The second estimator is the alternative estimator (tj) = £>f./6j with the trans- 
action time estimator t,f. from Figure [6 (red line). For the duration estimator 5j we determined 
the stepsize by minimizing the prediction error T,J~^{5j - (t j+1 -tj)} 2 leading to A = 0.1025. (Be- 
cause of the dependence of the durations 5j and - tj) usually are not independent and the 
minimization of the above criterion therefore is not approximately the same as the minimization 
of the mean squared error. Despite of this we think that the resulting A is reasonable.) 

The estimation results are provided in Figures [TJ and |8j First we state that both estimators 
roughly coincide in magnitude (which was not clear beforehand). From the upper plot of Figure [TJ 
we observe that (black line) produces some large spikes during the trading day (due to small 
values of tj - tj-i). The variability of S^(^) = £^./<5j is mainly a result of the variability of 
the duration estimator 8j (plotted in the lower plot) because the transaction time estimator Ef, is 
almost constant (apart from the beginning of the trading day - see Figure [6]). The fluctuation of 
the duration estimator is very high during the whole day. 
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Figure 7: Real data example: Estimation of time-varying spot volatility in clock time based on the transactions of 
symbol C for the 3rd September 2007. The upper plot gives the volatility estimators Et 3 s (black line) and Eaiffe) (red 
line). The middle plot shows the estimators for a fraction of the trading day. The averaged duration times 5j (for a 
fraction of the trading day) are given in the lower plot (the y-axis shows seconds). 

Figure [8] compares the transaction data and the volatility estimates for a small time period. 
The different behavior of the two estimators is apparent. We regard the strong spikes of S^ 5 as 
artificial due to small values of tj - tj-%. Note that the estimator needs about one minute to settle 
down again after the occurrence of a spike. On the other hand the small spikes of S^(iy) are 
caused by small averaged durations. For this reason we have more confidence in the second 
estimator. In addition, it is theoretically more appealing (because the transaction time volatility is 
almost constant and the variability of the clock time volatility is mainly caused by the variability of 
the trading intensity). 

The second estimator is also more stable for another reason: Because volatility in transaction 
time is less varying the particle filter in transaction time is more stable. 

6 Concluding Remarks 

We have presented a new technique for the on-line estimation of time-varying volatility based 
on noisy transaction data. The algorithm updates the volatility estimate immediately after the 
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Figure 8: Real data example: Estimation of time-varying spot volatility in clock time based on the transactions of 
symbol C for the 3rd September 2007. The figure only gives the results for a small fraction of the trading day (compare 
Figure[7|. The plots show (from top to bottom): transaction prices of C; our volatility estimators t,if (black line) and 
Egjf (tj) (red line); the averaged duration times 5j. 



occurrence of a new transaction, and it therefore is as close to the market as possible. It is 
straightforward to extend our method to more complicated price models (e.g. with a drift term) or 
other microstructure noise models. 

Our work was guided by the goal to execute all calculations on-line in a high-frequency situa- 
tion, and, at the same time, to base all methods on solid statistical principles. We feel that this 
goal has been achieved: Our algorithm is computationally efficient and it can be applied in real- 
time. On a recent personal computer an efficient implementation of our method requires a few 
milliseconds for a single update of the estimator (including one iteration of the particle filter with 
500 particles). We use particle filters in nonlinear state space models and EM-type algorithms. 

The contribution of this work is manifold. First, we have proposed a nonlinear market mi- 
crostructure noise model that covers bid-ask bounces, time-varying bid-ask spreads, and the 
discreteness of prices observed in real data. Second, the problem of on-line volatility estimation 
has been treated in a nonlinear state-space framework. The filtering distribution of the efficient 
price can be approximated with a particle filter and the volatility can be estimated as a param- 
eter of the filtering distribution. Third, we have presented a new sequential EM-type algorithm 
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which allows the on-line estimation of time-varying volatility. The usefulness of the approach for 
real-time applications has been demonstrated through Monte Carlo simulations and applications 
to stock data. 

Besides the new microstructure noise model we make a clear distinction between the (spot) 
volatility per time unit S c (t) and the volatility per transaction Volatility in clock time usually 
is much more volatile than volatility in transaction time. We advocate the use of transaction 
time for modeling, i.e. to estimate E(i), together with a subsequent transformation based on 
the trading intensity to obtain an estimator for S c (i). At least for our data sets it turned out that 
volatility in transaction time is almost constant (apart from the beginning of the trading day) and 
the fluctuation of clock time volatility is merely a result of fluctuation of the trading intensity. Thus 
a new focus in volatility estimation may be on the modeling of trading times. 

The empirical finance literature suggests that the bias induced by market microstructure ef- 
fects makes the most finely sampled data unusable, and most authors prefer to use log-returns 
with longer lags. Using log-returns with longer lags merely reduces the impact of microstructure, 
rather than quantifying and correcting its effect. It is obvious that a very good description and re- 
moval of microstructure effects allows for using log-returns with lag 1 leading to the most efficient 
estimate. Our estimates in |22) or |29) use log-returns with lag 1 but they can easily be modified 
to lag k by setting (in the transaction and continuous time model respectively) 

1 N N ( x c * — x ci ) ( x ci — x ci ) T 

i=l i=l 3 \ 

in case the microstructure correction with the state space model turns out to be insufficient for 
certain securities (with the same recursions <[25j> and |26|)). 

Of course it is desirable to have a complete mathematical theory on the methods of this paper. 
However, we think that this is very hard to achieve. Mathematically exact are the results on 
the particle filter given that the true volatility is known (i.e. with s£. = E tj ) - in particular the 
results from Proposition [T] on the optimal proposal and the importance weights. This means that 
the particle filter determines correctly the conditional distribution of the efficient prices given the 
observations. Even in the case of constant volatility and for the simplest estimator %. from 1 25 lit 
seems to be very difficult to establish consistency and the asymptotic distribution of the volatility 
estimator. In the simpler context of i.i.d. -observations convergence properties of recursive EM- 
type algorithms have been studied in Titterington (1984), Sato (2000), Wang and Zhao (2006), 
and Cappe and Moulines (2009) where also proofs of consistency and asymptotic normality are 
provided. Furthermore, there are several open mathematical questions on the interplay between 
transaction time volatility and clock time volatility. 

Acknowledgement: We are very grateful to the associate editor and two anonymous referees 
whose comments helped to improve the paper considerably. 

Disclaimer: The views expressed here are those of the authors and not necessarily those of its 
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